321 research outputs found

    Continuation of Nesterov's Smoothing for Regression with Structured Sparsity in High-Dimensional Neuroimaging

    Predictive models can be used on high-dimensional brain images to diagnose a clinical condition. Spatial regularization through structured sparsity offers new perspectives in this context: it reduces the risk of overfitting while providing interpretable neuroimaging signatures by forcing the solution to adhere to domain-specific constraints. Total Variation (TV) enforces spatial smoothness of the solution while segmenting predictive regions from the background. We consider the problem of minimizing the sum of a smooth convex loss, a non-smooth convex penalty (whose proximal operator is known) and a wide range of possible complex, non-smooth convex structured penalties such as TV or overlapping group Lasso. Existing solvers are limited either in the functions they can minimize or in their practical capacity to scale to high-dimensional imaging data. Nesterov's smoothing technique can be used to minimize a large number of non-smooth convex structured penalties, but reasonable precision requires a small smoothing parameter, which slows down convergence. To benefit from the versatility of Nesterov's smoothing technique, we propose a first-order continuation algorithm, CONESTA, which automatically generates a sequence of decreasing smoothing parameters. The generated sequence maintains the optimal convergence speed towards any globally desired precision. Our main contributions are as follows. We propose an expression of the duality gap to probe the current distance to the global optimum in order to adapt the smoothing parameter and the convergence speed. We provide a convergence rate, which is an improvement over classical proximal gradient smoothing methods. We demonstrate on both simulated and high-dimensional structural neuroimaging data that CONESTA significantly outperforms many state-of-the-art solvers in regard to convergence speed and precision. Comment: 11 pages, 6 figures, accepted in IEEE TMI, IEEE Transactions on Medical Imaging 201
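The trade-off described in this abstract (a small smoothing parameter gives precision but slow convergence) can be illustrated with a minimal sketch. It is not CONESTA: as an assumption made here for brevity, the smoothing parameter `mu` is simply halved at each stage instead of being derived from a duality gap, plain gradient descent replaces any accelerated scheme, and the problem is 1D TV denoising. All function names are hypothetical.

```python
def smoothed_tv(beta, mu):
    """Nesterov-smoothed 1D total variation: returns (value, gradient)."""
    diffs = [beta[i + 1] - beta[i] for i in range(len(beta) - 1)]
    # Optimal dual variable: diffs / mu projected onto [-1, 1].
    alpha = [max(-1.0, min(1.0, d / mu)) for d in diffs]
    value = sum(a * d - 0.5 * mu * a * a for a, d in zip(alpha, diffs))
    grad = [0.0] * len(beta)
    for i, a in enumerate(alpha):  # gradient is D^T alpha
        grad[i] -= a
        grad[i + 1] += a
    return value, grad


def tv(beta):
    """Exact (non-smooth) 1D total variation."""
    return sum(abs(beta[i + 1] - beta[i]) for i in range(len(beta) - 1))


def objective(beta, y, lam):
    """Least-squares denoising loss plus the exact TV penalty."""
    return 0.5 * sum((b - v) ** 2 for b, v in zip(beta, y)) + lam * tv(beta)


def continuation_denoise(y, lam, mu0=1.0, n_stages=8, iters_per_stage=200):
    """Gradient descent on the smoothed problem, warm-started across
    stages while mu is halved (assumed schedule, not the paper's)."""
    beta = list(y)
    mu = mu0
    for _ in range(n_stages):
        # Gradient Lipschitz constant: 1 (loss) + lam * 4 / mu (smoothed TV).
        step = 1.0 / (1.0 + 4.0 * lam / mu)
        for _ in range(iters_per_stage):
            _, g_tv = smoothed_tv(beta, mu)
            beta = [b - step * ((b - v) + lam * g)
                    for b, v, g in zip(beta, y, g_tv)]
        mu *= 0.5
    return beta
```

The step size shrinks as `mu` decreases, which is the slowdown the abstract refers to; starting with a large `mu` and warm-starting each stage is one simple way to spend the slow iterations only near the solution.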

    Analyse différentielle de puces à ADN. Comparaison entre méthodes wrapper et filter. [Differential analysis of DNA microarrays. Comparison of wrapper and filter methods.]

    In the context of gene expression data, we are interested in methods that identify genes that are significantly differentially expressed between two biological conditions. We compare a classical analysis based on hypothesis tests with differential analysis methods based on regularized regression. The difficulty with this kind of data set is the abundance of variables (the genes) relative to the small number of individuals (the expression profiles). The usual strategy is to run as many tests as there are variables and to treat the variables with the "best" p-value as the most important ones. An alternative strategy is to rank the variables not by their significance (for a test) but by their weight in the fitted regularized model. In the literature, methods of the first kind are called filter methods and those of the second kind are called wrapper methods. A good overview of wrapper and filter methods is given in [9]. The setting resembles supervised learning, since gene expression profiles are available for, where possible, the whole genome of an organism, and each microarray belongs to a class, i.e., a particular biological condition (for example, diseased vs. healthy). The methods discussed in this report were implemented in R [16]
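The two ranking strategies contrasted in this abstract can be sketched side by side. This is an illustrative example only: the report states that its implementation was done in R, whereas this sketch uses Python, a Welch t-statistic as the univariate test, and a ridge-penalized least-squares model fitted by gradient descent as the regularized model. These choices, the function names, and the default parameters are all assumptions.

```python
from statistics import mean, pstdev, stdev


def welch_t(a, b):
    """Welch t-statistic between two groups of expression values."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / (va / len(a) + vb / len(b)) ** 0.5


def filter_ranking(genes, labels):
    """Filter: one univariate test per gene, ranked by |t| (largest first)."""
    scores = []
    for g in genes:
        pos = [v for v, y in zip(g, labels) if y > 0]
        neg = [v for v, y in zip(g, labels) if y < 0]
        scores.append(abs(welch_t(pos, neg)))
    return sorted(range(len(genes)), key=lambda j: -scores[j])


def wrapper_ranking(genes, labels, lam=0.1, step=0.1, n_iter=2000):
    """Wrapper: rank genes by |weight| in a ridge-penalized linear model."""
    n = len(labels)
    # Standardize each gene so that the fitted weights are comparable.
    z = []
    for g in genes:
        m, s = mean(g), pstdev(g)
        z.append([(v - m) / s for v in g])
    w = [0.0] * len(genes)
    for _ in range(n_iter):
        resid = [sum(w[j] * z[j][i] for j in range(len(w))) - labels[i]
                 for i in range(n)]
        grad = [sum(r * zi for r, zi in zip(resid, z[j])) / n + lam * w[j]
                for j in range(len(w))]
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return sorted(range(len(genes)), key=lambda j: -abs(w[j]))


# Toy data: 3 genes measured on 6 microarrays, labels -1 (healthy) / +1 (diseased).
labels = [-1, -1, -1, 1, 1, 1]
genes = [
    [0.1, -0.2, 0.0, 2.1, 1.9, 2.2],   # clearly differential
    [0.5, -0.4, 0.1, 0.3, -0.2, 0.0],  # noise
    [-0.3, 0.2, 0.4, -0.1, 0.3, 0.1],  # noise
]
filter_order = filter_ranking(genes, labels)
wrapper_order = wrapper_ranking(genes, labels)
```

On this toy data both rankings put the first gene on top; the two approaches differ when genes are correlated, because the regularized model weighs each gene jointly with the others while the tests treat each gene in isolation.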

    Grouping levels of exposure with same observable effects before class prediction in toxicogenomics.

    Gene expression profiling in toxicogenomics is often used to find molecular signatures of toxicants. The range of doses chosen in toxicogenomics studies does not always represent all the possible effects on gene expression: several doses of a toxicant can lead to the same observable effect on the transcriptome. This makes the problem of dose exposure prediction difficult to address. We propose a strategy that groups doses with similar effects before a molecular signature is computed. The different groupings of doses are compared using criteria based on likelihood or Monte Carlo cross-validation. The molecular signature is then determined via a voting algorithm. Experimental results show that the resulting classifier has better prediction performance than the classifier computed from the original labeling.

    Simulated Data for Linear Regression with Structured and Sparse Penalties

    A very active field of research in bioinformatics is the integration of structure into machine learning methods. Recently developed methods claim to simultaneously link the computed model to the graphical structure of the data set and select a handful of important features in the analysis. However, there is still no way to simulate data for which we can separate the three properties that such methods claim to achieve. These properties are: (i) the sparsity of the solution, i.e., the fact that the model is based on a few features of the data; (ii) the structure of the model; (iii) the relation between the structure of the model and the graphical model behind the generation of the data.

    The genetic architecture of language functional connectivity

    Available online 18 December 2021. Language is a unique trait of the human species, of which the genetic architecture remains largely unknown. Through studies of language disorders, many candidate genes have been identified. However, such a complex and multifactorial trait is unlikely to be driven by only a few genes, and case-control studies, suffering from a lack of power, struggle to uncover significant variants. In parallel, neuroimaging has significantly contributed to the understanding of structural and functional aspects of language in the human brain, and the recent availability of large-scale cohorts like UK Biobank has made it possible to study language via image-derived endophenotypes in the general population. Because of its strong relationship with task-based fMRI (tbfMRI) activations and its ease of acquisition, resting-state functional MRI (rsfMRI) has been more widely popularised, making it a good surrogate of functional neuronal processes. Taking advantage of such a synergistic system by aggregating effects across spatially distributed traits, we performed a multivariate genome-wide association study (mvGWAS) between genetic variations and resting-state functional connectivity (FC) of classical brain language areas in the inferior frontal (pars opercularis, triangularis and orbitalis), temporal and inferior parietal lobes (angular and supramarginal gyri), in 32,186 participants from UK Biobank. Twenty genomic loci were found associated with language FCs, out of which three were replicated in an independent replication sample. A locus in 3p11.1, regulating EPHA3 gene expression, is found associated with FCs of the semantic component of the language network, while a locus in 15q14, regulating THBS1 gene expression, is found associated with FCs of the perceptual-motor language processing, bringing novel insights into the neurobiology of language. This research was conducted using the UK Biobank resource under application #64984.
This project was supported by the Marie Sklodowska-Curie program awarded to Stephanie J. Forkel (Grant agreement No. 101028551). Amaia Carrion-Castillo was supported by a Juan de la Cierva fellowship from the Spanish Ministry of Science and Innovation, and a Gipuzkoa Fellows fellowship from the Basque Government

    Brainomics: A management system for exploring and merging heterogeneous brain mapping data

    We propose an open source solution to manage brain imaging datasets and associated metadata. This framework is a powerful querying and reporting tool, customized for the needs of the emerging imaging-genetics field. A demonstration website and more details are available at http://brainomics.cea.fr

    Imaging genetics: bio-informatics and bio-statistics challenges

    The IMAGEN study, a very large European research project, seeks to identify and characterize biological and environmental factors that influence teenagers' mental health. To this aim, the consortium plans to collect data for more than 2000 subjects at 8 neuroimaging centres. These data comprise neuroimaging data, behavioral tests (for up to 5 hours of testing), and also white blood samples which are collected and processed to obtain 650k single nucleotide polymorphisms (SNP) per subject. Data for more than 1000 subjects have already been collected. We describe the statistical aspects of these data and the challenges, such as the multiple comparison problem, created by such a large imaging genetics study (i.e., 650k for the SNP, 50k data per neuroimage). We also suggest possible strategies, and present some first investigations using univariate or multivariate methods in association with re-sampling techniques. Specifically, because the number of variables is very high, we first reduce the data size and then use multivariate (CCA, PLS) techniques in association with re-sampling techniques.
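As an illustration of how re-sampling can address the multiple comparison problem mentioned in this abstract, here is a minimal sketch of a max-statistic permutation test. The abstract does not say which re-sampling procedure was used, so this particular procedure, the Pearson correlation as the association statistic, and all names and toy values are assumptions.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_stat_permutation_test(snps, phenotype, n_perm=500, seed=0):
    """Corrected p-value per SNP: how often the largest |correlation|
    over ALL SNPs, under a shuffled phenotype, reaches the observed one."""
    rng = random.Random(seed)
    observed = [abs(pearson(s, phenotype)) for s in snps]
    exceed = [0] * len(snps)
    shuffled = list(phenotype)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any SNP/phenotype link
        max_null = max(abs(pearson(s, shuffled)) for s in snps)
        for j, obs in enumerate(observed):
            if max_null >= obs:
                exceed[j] += 1
    return [(1 + e) / (1 + n_perm) for e in exceed]


# Toy data: 3 SNPs (genotypes coded 0/1/2) and one imaging phenotype, 10 subjects.
phenotype = [0.1, 0.3, 0.2, 1.0, 1.2, 0.9, 1.1, 2.1, 1.9, 2.0]
snps = [
    [0, 0, 0, 1, 1, 1, 1, 2, 2, 2],  # strongly associated
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0],
    [2, 0, 1, 1, 0, 2, 1, 0, 2, 1],
]
p_values = max_stat_permutation_test(snps, phenotype)
```

Comparing each observed statistic with the null distribution of the maximum over all variables controls the family-wise error rate without assuming independence between SNPs, at the cost of recomputing every statistic for every permutation.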

    Anxiety onset in adolescents : a machine-learning prediction

    Publisher Copyright: © 2022, The Author(s). Recent longitudinal studies in youth have reported MRI correlates of prospective anxiety symptoms during adolescence, a vulnerable period for the onset of anxiety disorders. However, their predictive value has not been established. Individual prediction through machine-learning algorithms might help bridge the gap to clinical relevance. A voting classifier with Random Forest, Support Vector Machine and Logistic Regression algorithms was used to evaluate the predictive pertinence of gray matter volumes of interest and psychometric scores in the detection of prospective clinical anxiety. Participants with clinical anxiety at age 18–23 (N = 156) were investigated at age 14 along with healthy controls (N = 424). Shapley values were extracted for in-depth interpretation of feature importance. Prospective prediction of pooled anxiety disorders relied mostly on psychometric features and achieved moderate performance (area under the receiver operating curve = 0.68), while generalized anxiety disorder (GAD) prediction achieved similar performance. MRI regional volumes did not improve the prediction performance of prospective pooled anxiety disorders with respect to psychometric features alone, but they improved the prediction performance of GAD, with the caudate and pallidum volumes being among the most contributing features. To conclude, in non-anxious 14-year-old adolescents, future clinical anxiety onset 4–8 years later could be individually predicted. Psychometric features such as neuroticism, hopelessness and emotional symptoms were the main contributors to pooled anxiety disorders prediction. Neuroanatomical data, such as caudate and pallidum volume, proved valuable for GAD and should be included in prospective clinical anxiety prediction in adolescents. Peer reviewed
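The idea of a voting classifier can be shown with a minimal sketch. The abstract does not state whether hard (majority) or soft (probability-averaging) voting was used; this example assumes hard voting, and the three prediction lists are made-up placeholders standing in for the outputs of the three fitted models.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote."""
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Hypothetical predictions for 4 participants (1 = future anxiety, 0 = control).
random_forest_preds = [1, 0, 1, 0]
svm_preds = [1, 1, 0, 0]
logistic_preds = [0, 1, 1, 0]
combined = majority_vote([random_forest_preds, svm_preds, logistic_preds])
```

With three binary classifiers a tie cannot occur, so each participant receives the label that at least two of the models agree on.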

    Brainomics: Harnessing the CubicWeb semantic framework to manage large neuromaging genetics shared resources

    In neuroscience or psychiatry, large multicentric population studies are being acquired and the corresponding data are made available to the acquisition partners or the scientific community. The massive, heterogeneous and complex data from genetics, imaging, demographics or scores rely on ontologies for their definition, sharing and access. These data must be efficiently queryable by the end user and the database operator. We present the tools based on the CubicWeb open-source framework that serve the data of the European projects IMAGEN and EU-AIMS.

    Enhancing the Reproducibility of Group Analysis with Randomized Brain Parcellations

    Neuroimaging group analyses are used to compare the inter-subject variability observed in brain organization with behavioural or genetic variables and to assess risk factors of brain diseases. The lack of stability and of sensitivity of current voxel-based analysis schemes may however lead to non-reproducible results. A new approach is introduced to overcome the limitations of standard methods, in which active voxels are detected according to a consensus on several random parcellations of the brain images, while a permutation test controls the false positive risk. On both synthetic and real data, this approach shows higher sensitivity, better recovery and higher reproducibility than standard methods and succeeds in detecting a significant association in an imaging-genetic study between a genetic variant next to the COMT gene and a region in the left thalamus on a functional Magnetic Resonance Imaging contrast.